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ABSTRACT 


The  hardware  of  floating-point  MULTIPLY,  ADD,  and  SUBTRACT  units 
are  designed  to  support  the  multiplication,  addition,  and  subtraction 
operation  necessary  in  the  Fast  Fourier  Transform  (FFT). 

In  this  thesis,  the  IEEE  floating-point  standard  is  adopted  and  scaled 
down  to  16  bits,  but  the  exponent  is  an  excess-8  number  represented  using 
radix-2.  A  16  bit  reduced  word  size  floating-point  arithmetic  unit  for  high 
speed  signal  analysis  was  implemented.  The  layout  verification,  functional 
simulation,  and  timing  analysis  of  these  units  have  been  performed  on  the 
Genesil  Silicon  Compiler  (GSC)  system  that  was  developed  to  overcome  the 
shortcomings  of  the  time  consuming  custom  layout  methods.  The  design  of 
this  thesis  work  can  be  used  for  further  investigation  of  the  high  speed, 
pipelined  floating-point  arithmetic  units. 
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I.  INTRODUCTION 


A.  BACKGROUND 

In  this  section,  the  basic  concepts  of  The  Fast  Fourier  Transform  and  The 
Genesil  Silicon  Compilier  System  are  introduced. 

1.  The  Fast  Fourier  Transform 

For  some  applications  (e.g.,  speech,  radar  signal,...,  etc),  analysis  in 
the  frequency  domain  is  simpler  than  that  in  the  time  domain.  In  signal 
processing  and  spectral  analysis  the  frequency  domain  is  used.  The  fast 
fourier  transform  (FFT)  is  used  to  transform  data  from  the  time  domain  into 
the  frequency  domain.  Availability  of  special-purpose  hardware  in  both  areas 
has  led  to  sophisticated  signal-processing  systems  based  on  the  features  of  the 
FFT. 

The  popularity  of  the  FFT  is  evidenced  by  the  wide  areas  of 
application.  In  addition  to  conventional  radar,  communications,  sonar,  and 
speech  signal-processing  applications,  current  applications  of  FFT  usage 
include  biomedical  engineering,  imaging,  metallurgical  analysis,  mechanical 
analysis,  geophysical  analysis,  simulation,  music  synthesis,  and  the 
determination  of  weight  variation  in  the  production  of  paper  from  pulp  [Ref. 
1].  Note  that  The  radar  and  communication  signal  processing  demands  high 
speed  FFT  computation. 

2.  The  Genesil  Silicon  Compiler  System 

The  Genesil  Silicon  Compiler  (GSC)  system  is  design  automation 
software  system  that  allows  systems  engineers  and  circuit  designers  to  design 
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complex  Very  Large  Scale  Integration  (VLSI)  computer  chips.  GENESIL 
produces  Integrated  Circuited  (IC)  designs  from  architectural  descriptions  and 
allows  for  their  verification. 

The  GSC  is  based  on  cin  object-oriented  hierarchical  system  running 
under  the  UNIX  operating  system.  The  objects  consists  of  Blocks,  Modules, 
Chips,  and  Chip-sets.  The  GSC  allows  a  designer  to  easily  create  complex 
circuits  using  devices  from  the  compiler  library.  This  design  procedure  does 
not  require  the  expertise  and  tedium  involved  with  design  at  the  transistor 
level  [Ref.  2.]. 

The  advantage  that  a  silicon-compiler-based  process  has  over  a 
custom  IC  system  design  process  is  that  the  latter  requires  a  team  of  experts  in 
the  fields  of  logic  implementation,  circuit  simulation,  chip  layout,  and 
testing.  However,  the  design  process  based  on  the  silicon  compiler  may  be 
accomplished  by  the  individual  utilizing  a  top-down  hierarchical  design 
methodology  beginning  with  a  partitioned  chip  set,  progressing  downward 
into  individual  chips  and  modules,  and  terminating  at  the  block  level.  There 
is  far  less  time  required  to  design  an  IC  using  a  silicon  compiler  than  for  a  full 
custom  manual  CAD  method  using  graphic  layout  tools.  This  is  especially 
attractive  for  military  applications  where  small  numbers  of  special  purpose 
chips  are  required  and  a  rapid  turnaround  time  is  desired.  Thus,  one  can  see 
that  the  silicon  compiler  provides  a  streairdined  method  for  rapid 
development  of  IC  products.  The  disadvantages,  however,  of  the  silicon 
compiler  are  that  the  resulting  circuit  is  often  slower  and  the  layout  is  not 
always  efficient  in  its  use  of  silicon  area  (Ref.  3]. 
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B.  THESIS  GOALS  AND  ORGANIZATION 

The  main  goal  of  this  thesis  was  to  design  a  floating-point  FFT  unit  using 
the  GENESIL  Silicon  Compiler.  The  motivation  for  this  thesis  was  to  explore 
the  feasibility  of  designing  the  FFT  with  floating-point  arithmetic  units  using 
state-of-the-art  VLSI  circuit  design  tools.  The  investigation  was  thorough  and 
the  basic  design  was  developed.  The  floating-point  number  system  and  FFT 
structure  will  be  introduced  In  Chapter  II.  The  design  process  of  floating¬ 
point  multiplication  and  addition  will  be  discussed  in  Chapter  HI.  The  results 
of  the  simulation  of  floating-point  multiplication,  addition,  and  FFT  will  be 
presented  in  Chapter  IV.  Chapter  V  includes  the  conclusions  of  this  thesis 
and  the  recommendations  for  future  investigation. 
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II.  FLOATING-POINT  FFT 


In  this  chapter,  we  will  introduce  the  floating-point  number  system  in 
section  A,  the  floating-point  arithmetic  in  section  B,  and  the  FFT  structure  in 
section  C.  A  detailed  description  of  the  hardware  design  will  be  discussed  in 
Chapter  III,  and  some  examples,  which  include  the  special  cases,  will  be 
presented  in  Chapter  IV. 

A.  FLOATING-POINT  NUMBER  SYSTEM 

There  are  four  binary  number  systems  most  commonly  considered  for 
use  in  fixed  point  arithmetic  operations  [Ref.  4]: 

•  sign-magnitude, 

•  ones'  complement, 

•  two's  complement,  and 

•  excess  2  "'■l 

Traditionally,  the  excess  number  system  is  used  to  represent  the  exponen 
of  floating-point  number.  The  range  of  excess  representation  is  exactly  the 
same  as  that  of  two's  complement  numbers.  In  fact,  the  representation  for 
the  sign  bits,  the  excess  and  two's  complement  number  systems  are  always 
opposite.  A  4-bit  comparison  table  of  two's  complement  and  excess 
representation  is  shown  in  the  Table  2.1. 

The  sign-magnitude  system  uses  the  most  significant  bit  to  represent  the 
sign  of  the  number;  all  other  bits  represent  the  magnitude.  It  is  possible  to 
use  any  of  the  fixed  point  representation  schemes  for  the  fracton  of  a  floating¬ 
point  number.  Each  has  its  own  advantages  and  disadvantages.  In  this  thesis. 
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TABLE  2.1.  A  4-BIT  COMPARISON  TABLE  OF  TWO'S 

COMPLEMENT  AND  EXCESS-8  REPRESENTATION 


Two's 

Complement 

value 

Excess  -  8 
value 

0000 

0 

-8 

0001 

1 

-7 

0010 

2 

-6 

0011 

3 

-5 

0100 

4 

-4 

0101 

5 

-3 

0110 

6 

-2 

0111 

7 

-1 

1000 

-8 

0 

1001 

-7 

1 

1010 

-6 

2 

1011 

-5 

3 

1100 

-4 

4 

1101 

-3 

5 

1110 

-2 

6 

1111 

-1 

7 

the  sign  and  fraction  of  floating-point  number  are  described  by  the  sign- 
magnitude  system. 

The  single-precision  arithmetic  refers  to  those  operations  defined  over 
standfird  data  operands  with  word  length  equal  to  that  of  one  memory  word. 
The  word  length,  4,  8, 12, 16,...,  32,...,  could  be  used,  because  of  typical  memory 
chip  sizes.  But  the  fraction  part  (including  sign  bit)  of  the  word  should  not  be 
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less  than  12  bits;  in  fact,  the  12  bits’  quantization  is  often  too  coarse  for  signal 
spectral  analysis.  However,  the  goal  of  this  thesis  is  to  design  a  high  speed 
arithmetic  chip,  and  reducing  the  word  size  increases  arithmetic  speed. 

Therefore,  we  use  the  minimal  word  length,  16  bits,  to  represent  the 
floating-point  numbers  as  follows: 

•  1  bit  for  the  sign, 

•  4  bits  for  the  exponent,  and 

•  11  bits  for  the  fraction. 

The  IEEE  floating-point  standard  is  adopted  and  scaled  down  to  16  bits,  but 
the  exponent  is  an  excess-8  number  represented  using  radix-2.  The  fraction 
represents  a  number  less  than  one,  but  the  significant  of  the  floating-point 
number  is  one  plus  the  fraction  part.  In  other  words,  if  e  is  the  value  of  the 
exponent  and  /  is  the  value  of  the  fraction,  the  magnitude  being  represented 
is  l.f  x  2®'8.  The  fractional  part  of  a  floating-point  number  must  not  be 
confused  with  the  significant,  which  is  one  plus  the  fractional  part.  The 
leading  1  in  the  significant  l.f  does  not  appear  in  the  representation;  that  is, 
the  leading  bit  is  implicit.  This  is  often  referred  to  as  the  hidden  one 
technique.  When  performing  arithmetic  on  numbers,  the  fraction  part 
normally  needs  to  be  made  explicit  [Ref.  5].  The  representable  range  of  the 
floating-point  numbers  is  shown  in  Figure  2.1. 

B.  FLOATING-POINT  ARITHMETIC 

Differences  between  the  floating-point  arithmetic  and  the  integer 
arithmetic  are  as  follows:  an  exponent  field  must  be  manipulated,  in  addition 
to  the  fraction  field,  and  the  result  of  a  floating-point  operation  usually  has  to 
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zero 


Figure  2.1  Ranges  of  16  Bit  Numbers  in  Floating  Point  Representations 

be  rounded  to  be  represented  by  another  floating-point  number  of  the  same 
precision. 

1.  Floating-point  Multiplication 

Floating-point  multiplication  is  perhaps  the  simplest  floating-point 
operation  in  terms  of  the  required  hardware.  No  alignment  of  the  operands 
are  required  before  initiating  the  operation,  and  minimal  normalization  is 
required  at  the  end  of  the  operation. 

The  logic  of  floating-point  multiplication  of  two  numbers  is  shown 
in  Figure  2.2  [Ref.  6].  The  fraction  of  the  result  is  the  product  of  the  multiplier 
and  multiplicand  fractions;  the  exponent  of  the  result  is  the  sum  of  the 
exponents  of  the  input  operands;  the  sign  bit  of  the  result  is  simply  the 
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Exponent 

Add 


Miltiply 


Result  Sign 


Result  Exponent  Result  Fraction 


Figure  2.2  Block  Diagram  for  Floating  Point  Multiply 
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"exclusive-or"  of  the  sign  bit  of  the  operands.  Note  that  the  above  three 
operations  can  be  performed  simultaneously;  they  do  not  depend  on  each 
other. 

In  any  floating-point  operation,  the  normalization  process  is 
necessary  for  producing  the  correct  result.  It  has  been  observed  that  the 
product  of  two  normalized  floating-point  numbers  may  have  one  of  the  three 
possibilities  in  the  value  left  to  the  decimal  point:  11,  10,  01.  This  can  be 
shown  as  follows. 


1.  AAAA. 
x)  1.  BBBBB 


ll.fffff., 
or  lO.fffff. 
or  Ol.fffff. 


The  result  of  either  11  or  10  indicates  an  overflow  due  to  a  carry  and 
the  product  must  be  normalized  (shifted  to  the  right  one  bit).  There  is  no 
need  to  normalize  a  product  if  the  result  is  01  (i.e.,  no  carry  has  occurred). 
Note  that  when  the  normalization  takes  place  the  value  in  the  exponent  field 
must  be  adjusted  to  reflect  the  correct  value.  For  example,  after 
normalization,  the  value  (11.1000)2  x  2^  is  equal  to  (1.1100)2  x  2^. 

2.  Floating-Point  Addition 

Compared  to  the  multiplication,  floating-point  addition  is  a  much 
more  complex  operation.  This  is  because  that  addition  requires  both  numbers 
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to  have  the  same  exponent,  in  order  to  have  proper  alignment  of  the 
fractional  portion  of  the  number.  The  sign  bit  also  must  be  tested  to  ensure 
the  proper  result,  and  not  just  an  absolute  value  determination  made.  Figure 
2.3  shows  a  block  diagram  for  the  addition  process  [Ref.  6]. 

Before  the  two  floating-point  numbers  can  be  properly  added 
together,  the  fraction  must  be  aligned  (unless  the  exponents  of  the  two 
numbers  are  the  same).  This  involves  determining  which  operand  is 
smaller,  and  then  aligning  the  fraction  of  that  operand  appropriately  with  the 
fraction  of  the  larger  operand.  The  alignment  is  accomplished  by  shifting  the 
fraction  of  the  smaller  operand  a  number  of  positions  to  the  right,  therefore 
making  the  digits  of  the  smaller  operand  line  up  with  the  digits  of  the  same 
significance  in  the  larger  operand.  The  amoimt  of  the  alignment,  the  number 
of  positions  to  shift,  is  determined  by  the  difference  in  the  exponents.  The 
addition  element  then  receives  the  fraction  directly  from  the  larger  operand, 
and  the  aligned  fraction  from  the  smaller  operand.  The  resulting  number  is 
then  provided  to  the  POST  NORMALIZATION  unit.  The  post  normalization 
unit  must  be  capable  of  a  shift  of  at  least  one  position  to  lesser  significance, 
and  must  also  be  capable  of  shifts  of  many  positions  to  higher  significance. 
Note  that  this  post  normalization  must  then  be  capable  of  adjusting  the  size 
of  the  exponent  to  reflect  any  normalization.  At  the  end  of  this  process,  the 
result  will  have  been  properly  formed  and  ready  for  any  additional  operation 
required  of  it.  [Ref.  6] 

The  floating-point  addition  also  includes  subtraction,  since  the  sign- 
magnitude  method  of  storing  information  necessitates  that  the  hardware  be 
capable  of  both.  [Ref.  6] 
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Result  Exponent  Result  Fraction 


Figure  2.3  Block  Diagram  for  Floating  Point  Addition 
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3.  Overflow,  Underflow,  and  Extra  bits 

Overflow  occurs  when  an  operation  produces  a  result  that  has 
exceeded  the  ability  of  the  system  to  represent  information.  This  can  occur 
when  multiplying  two  numbers  whose  exponents  sum  up  to  an  exponent  not 
representable  in  the  system,  or  adding  two  numbers  that  already  are  the 
maximum  representable  numbers  by  the  system.  Overflow  can  also  be  caused 
by  a  post  normalization  (e.g.,  when  the  exponent  is  1111  and  after 
normalization  has  occurred,  the  resulting  exponent  is  0000  which  does  not 
represent  the  vcdue  desired. 

Underflow  occurs  when  an  operation  produces  a  result  that  is  too 
small  to  represented  in  the  number  system.  This  can  occur  in  the 
multiplication  process  when  two  negative  exponents  are  added  and  the 
resulting  exponent  can  not  be  represented  in  the  system  (e.g.,  if  one  exponent 
is  0001  (2*7)  and  the  other  exponent  is  0100  (2*4),  then  the  resulting  exponent 
is  2*^^  that  can  not  be  represented  in  this  number  system). 

The  floating-point  operations  of  multiplication,  addition,  and 
subtraction  may  increase  the  number  of  bits  in  the  fraction  beyond  the 
number  we  can  actually  store.  There  are  many  ways  to  deal  with  these  extra 
bits.  The  obvious  and  simplest  method  is  merely  to  ignore  them;  this  is 
called  truncation.  The  unwanted  bits  are  dropped  from  the  result.  This 
results  in  a  bias;  the  magnitude  of  the  number  to  be  stored  is  smaller  than  the 
true  value.  The  VLSI  cells  developed  in  this  thesis  use  the  simple  truncation 
technique.  The  designs  are  easily  modified  for  rounding  or  jamming.  Other 
more  complex  techniques  (e.g.,  zero  bias  rounding  and  ROM  rounding  [Ref. 
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6])  can  be  used  with  more  modifications,  but  increase  delay  of  the  unit  and 
increase  its  size. 

C  FFT  STRUCTURE 

For  a  Discrete  Fourier  Transform  (DFT)  calculation,  the  operations  are 
performed  on  a  sequence  of  data.  Assume  that  the  total  number  of  input  data 
is  N,  which  is  a  power  of  two.  For  a  finite-duration  sequence  x(n),  the  DFT 
formula  is  defined  as: 


N-l 

X(k)  =  ^x(n)e^  for  k  =  0, 1, ....  N-l 

n=0 

The  FFT  terminology  refers  to  methods  for  fast  computation  of  the  DFT 
[Ref.  7].  In  the  following,  a  brief  description  of  two  data  flow  designs  of  Fast 
Fourier  Transform  are  presented.  They  are  the  methods  of  Decimation  In 
Time  and  Decimation  In  Frequency  [Ref.  7}. 

1.  Decimation  In  Time  (DIT) 

This  method  divides  x(n)  into  two  halves:  one  with  odd  sequence 
numbers,  and  the  other  with  even  sequence  numbers  Through  the  well- 
known  butterfly  operation  graph  for  the  DIT,  the  fast  fourier  transform  can  be 
represented  by  Figure  2.4.  Where  is  known  as  the  weighting  factor,  and  is 
related  to  the  term  in  the  DFT. 

2.  Decimation  In  Frequency  (DIF) 

Another  equivalent  way  to  decompose  the  calculation  of  the  DFT  is 
known  as  the  decimation  in  frequency.  The  signal  flow  of  the  butterfly  is 
shown  in  Figure  2.5. 
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Figure  2.4  Signal  Flow  Graph  of  DIT  Butterfly 


Figure  2,5  Signal  Flow  Graph  of  DIF  Butterfly 

A  radix-2  FFT  (two-point  DFT),  either  DIT  or  DIF  requires  one 
complex  multiply  and  two  complex  additions.  These  are  equivalent  to  four 
multiplications  and  six  additions  of  real  numbers. 


14 


in.  THE  HARDWARE  DESIGN  OF  FLOATING-POINT 
MULTIPLY  UNIT  AND  ADD  UNIT 


The  algorithm  for  floating-point  multiplication  and  addition  has  been 
described  in  Chapter  II.  The  hardware  configuration  of  the  floating-point 
MULTIPLY  unit  and  ADD  imit  is  described  in  section  A  and  section  B.  These 
two  units  can  be  used  as  the  basic  hardware  components  for  illustrating  the 
execution  of  standard  floating-point  operations.  The  hardware  of  the 
floating-point  SUBTRACT  unit  will  be  discussed  in  section  C,  and  the  the 
hardware  of  the  floating-point  FFT  unit  will  be  presented  in  section  D. 

A.  THE  HARDWARE  DESIGN  OF  FLOATING-POINT  MULTIPLY  UNIT 
The  floating-point  multiply  unit  includes  adders,  a  parallel  multiplier, 

multiplexers  (MUXs),  and  logic  gates.  Figure  3.1.  shows  the  block  diagram  of 
floating-point  multiply  unit.  The  two  input  operands  to  be  specified  as  A  and 

B,  and  the  result  will  be  W.  Thus,  within  the  floating-point  unit,  the  operand 
A  will  be  made  up  of  the  catenating  of  its  three  components, 

A  =  as,  ae,  am 

the  1  bit  sign,  as,  the  4-bit  exponent,  ae,  and  the  11-bit  fraction,  am. 

There  are  seven  major  steps  which  must  be  executed  in  order  to  complete 
the  multiplication  of  two  16  bit  floating-point  numbers. 

1.  Check  for  zero  operands 

As  we  mentioned  in  Chapter  II  that  the  floating-point  format  is  the 
IEEE  floating-point  standard  and  scaled  down  to  16  bits,  and  the  exponent  is  a 
excess-8  number  represented  using  radix-2.  With  this  format  the  smallest 
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Figure  3.1  The  Block  Diagram  of  Floating-point  Multiply  Unit 
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number  that  can  be  represented  is  (1.000...)2  x  2*7.  Thus,  a  number  that  is 
between  [-Vmin  ~  +Vmin]  will  be  treated  as  zero.  Therefore,  when  the 
exponent  of  either  the  multiplier  or  the  multiplicand  is  2*8  (i.e.,  0000)  the 
product  should  be  zero.  To  test  if  one  or  both  of  the  operands  are  zero,  we  can 
use  two  NOR  gates  and  one  OR  gate  to  implement.  This  testing  result  will  be 
used  to  select  the  final  product,  which  is  shown  in  Figure  3.1. 

2.  Set  the  product  sign 

In  our  floating-point  number  system,  a  negative  number  has  its  sign 
bit  as  1.  The  sign  portion  of  a  floating-point  multiplier  can  be  produced  with 
an  XOR  gate.  The  sign  bit  is  a  "1”  only  when  the  signs  of  the  multiplier  and 
the  multiplicand  are  different. 

3.  Exponent  addition 

Refer  to  Figure  3.1,  we  can  accomplish  the  exponent  addition  with 
the  logic  block  A  and  two  adders.  Having  observed  in  Table  2.1  on  page  5,  the 
excess-8  number  representation,  we  notice  that  the  most  significant  bit  (MSB) 
in  the  exponent  addition  can  be  represented  as: 

S[14]  =  a[14]b[14l  +  all4]Cout  +  b[14]Cout 

This  can  be  explained  as  follows.  In  Figure  3.1  Cout  is  the  carry-out  of  the 
Adder_0.  There  are  three  possibilities  when  adding  two  exponents:  both 
positive,  both  negative,  and  opposite  in  signs.  Refer  to  Table  2.1,  the  MSB  of 
the  exponent  of  an  excess-8  number  is  1  (0)  when  it  is  a  positive  (negative). 
Therefore,  when  both  a[14]  and  b[141  are  positive  (negative),  the  sign  bit 
should  be  1,  when  a[14]  and  b[14]  are  opposite  in  signs  the  resulting  sign 
depends  on  the  Cout-  The  truth  table,  Karnaugh  map,  and  the  gate  structure 
are  shown  in  the  Figure  3.2. 
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Figvue  3.2  The  Logic  Design  of  Block  A  in  MULTIPLY  Unit 


4.  Fraction  multiplication 

The  fraction  multiplication  can  be  implemented  by  one  parallel 
multiplier  and  one  adder  (Adder_2).  Note  that  the  MSBs  of  the  two  input 
operands  of  the  parallel  multiplier  are  the  hidden  ones. 

5.  Normalization 

As  described  in  Chapter  II,  the  fraction  multiplication  may  have 
three  possible  integer  parts  (two  digits  to  the  left  of  the  binary  point):  11,  10, 
and  01.  If  the  integer  part  is  11  or  10  (i.e.,  x[ll]  =  1),  we  normalize  by  taking  11 
bits  (x[10  :  0])  as  the  significant  and  increment  the  exponent  sum  by  1.  If  the 
integer  part  is  01,  we  simply  discard  the  leading  0  of  the  significant  and  take 
the  11  bits  (x[9  :  0],  L[ll])  as  the  fraction  output  [Ref.  8].  Note  that  the  hidden  1 
(x[ll]  or  x[10])  is  implied. 

6.  Force  to  zero 

Figure  2.1  on  page  7  shows  the  range  of  zero.  The  normalization, 
after  significant  multiplication,  could  adjust  the  product  exponent  into  the 
representable  range  (e.g.,  the  product  exponent  of  0100  (i.e.,  2'4)  and  0100  (i.e., 
2-4)  is  0000  (i.e.,  2-8),  and  the  integer  part  of  the  product  fraction  is  11  or  10; 
after  the  normalization,  the  product  exponent  is  adjusted  to  0001  (i.e.,  2-^)).  In 
the  other  words,  the  result  of  exponent  additiv  n  is  0000  (i.e.,  2-8)  and  the  carry¬ 
out  (x[ll])  of  fraction  multiplication  is  zero  that  will  force  the  result  of  this 
multiplication  to  zero.  Figure  3.1  shows  the  output  of  the  NOR  gate  and  bit 
x[ll]  select  the  output  (S[10  :  0])  of  MUX_0  is  zero,  shifting  the  significant  one 
place  to  the  right,  or  simply  discard  the  leading  0  of  the  significant. 
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7.  Overflow  and  underflow 

Using  Table  2.1  and  Figure  2.1,  we  know  that  overflow  can  occur 
when  a[14],  b[14]  and  Cout  are  ones.  Since  the  shifting  for  normalization  can 
never  be  more  than  one  digit,  so  overflow  can  also  occur  when  the  result  of 
exponent  addition  is  1111  (i.e.,  2+^)  and  the  carry-out,  x[ll],  of  Adder_2  is  "1" 
(i.e.,  2+8  can  not  be  represented  in  Figure  2.1).  Underflow  can  occur  when 
a[14],  b[14]  and  Cout  are  zeros.  Note  that  a[14]  and  b[14]  are  the  MSBs  of  the 
exponent  part  of  two  operands.  Three  AND  gates,  three  inverters,  and  one 
OR  gate  show  the  overflow  flag  (ov)  and  the  imderflow  flag  (im)  in  Figure  3.1. 

B.  THE  HARDWARE  DESIGN  OF  FLOATING-POINT  ADD  UNIT 

The  floating-point  ADD  unit  includes  adders,  multiplexers, 
programmable  logic  array  (PLA),  shifter,  and  logic  gates.  Figure  3.3  shows  the 
logic  blocks  diagram  of  floating-point  ADD  unit.  As  described  in  Chapter  II, 
the  floating-point  addition  is  more  complicated  than  the  floating-point 
multiplication.  The  reason  for  this  is  the  alignment  of  operands  required 
before  initiating  the  floating-point  addition  and  normalization  is  required  at 
the  end  of  the  transaction. 

There  are  six  major  steps  which  must  be  executed  in  order  to  complete 
the  addition  of  two  16  bit  floating-point  numbers. 

1.  Align  the  fraction  by  equalizing  their  exponents 
The  comparison  of  the  exponents  is  realized  with  two's  complement 
subtraction.  Therefore,  a  two's  complement  converter  is  needed,  and  shown 
in  Figure  3.4  [Ref.  12].  Two  adders  (Adder_0,  Adder_l)  compare  the  two 
exponents,  ae,  and  he.  There  are  three  possibilities,  ae  >  he,  ae  =  he,  and  ae  < 
he.  If  the  result  is  either  ae  >  he  (i.e.,  Ae[3]  =  0)  or  ae  <  he  (i.e.,  Ae[3]  =  1),  then 
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ae[3:0]  be[3:0] 


Figure  3.3.a  The  Block  Diagram  of  Floating-point  Add  Unit 


21 


Figure  3.4  Two's  Complement  Converter 

the  smaller  exponent  must  be  incremented  to  match  the  magnitude  of  the 
larger  exponent.  Since  the  number  system  is  based  on  two,  the  alignment 
network  must  be  capable  of  shifting  any  number  of  bits,  from  zero  to  twelve. 
Comparison  of  the  two  exponents  provides  a  binary  number  (SEI3  :  0]  in 
Figure  3.3.a)  which  indicates  how  far  the  smaller  exponent  needs  to  be  shifted 
to  complete  the  alignment  process.  Design  of  the  network  used  to  align  the 
smaller  fraction  to  be  added  to  the  larger  fraction  is  in  Figure  2.3  on  page  11. 
Figure  3.5  shows  one  way  of  accomplishing  this  is  to  use  four  2-1 
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multiplexers.  The  MSB  (SE[3])  of  this  number  is  then  used  by  the  first  level  of 
MUXs  to  shift  the  number  by  eight  bits  (the  1  condition),  or  provide  no  shift 
at  all  (the  0  condition).  Similarly,  the  second  MSB  (SE[2])  of  the  number  is 
used  by  the  second  set  of  MUXs  to  shift  the  number  provided  by  the  first  set  of 
MUXs  by  four  (the  1  condition)  or  provide  no  shift  at  all  (the  0  condition). 
This  process  continues,  with  each  level  of  multiplexers  shifting  the  number 
by  some  power  of  two,  until  all  four  bits  have  been  aligned.  For  example,  if 
SE[3  :  0]  =  0110,  then  the  smaller  fraction  will  be  shifted  to  the  right  6  bits. 

2.  Add/subtract  the  fraction 

The  result  of  the  comparison  (ae  and  be)  directs  the  MUX_2  to  select 
the  unaligned  fraction  (larger  exponent),  and  the  same  signal  directs  the 
MUX_1  to  select  the  other  fraction  (smaller  exponent)  and  align  it  by  shifting 
the  appropriate  number  of  positions.  These  two  results,  one  unaligned 
fraction  and  one  aligned  fraction,  are  then  fed  to  the  Adder_2  for  the  actual 
calculation.  Since  both  addition  and  subtraction  must  be  accommodated,  the 
use  of  two's  complement  arithmetic  is  appropriate.  Which  will  require 
conversion  of  operands  from  sign-magnitude  to  two's  complement  form. 
For  either  addition  or  subtraction,  a  negative  operand  (an  addend  or  the 
minuend)  is  converted  to  two's  complement  form  by  complementing  the 
significant.  For  addition,  the  second  operand  is  similarly  complemented  if  it 
is  negative.  For  a  negative  number,  all  that  is  needed  is  to  change  the  sign  bit; 
for  a  positive  number,  both  the  sign  bit  and  the  significant  must  be 
complemented.  The  two  sign  bits  (abs,  Bb[12])  select  this  operation  for  true 
addition  or  true  subtraction.  Note  that  the  ones'  converter  and  the  XOR  gate 
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Figure  3.5  Logic  for  Alignment  Shift 


replace  the  two’s  converter. 

The  above  statement  on  two  fractions  (including  the  sign  bit  and 
hidden  1)  addition  indicates,  when  the  result  is  negative,  that  in  two's 
complement  form  should  be  recomplemented  back  to  its  sign-magnitude 
form.  The  Logic  Block  B  consists  of  two  AND  gates,  two  inverters,  and  one 
OR  gate,  its  truth  table,  K  map,  and  the  gate  structure  are  shown  in  Figure  3.6. 
The  output  (signal  :  sel)  of  this  logic  block  B  directs  the  MUX  _5  to  select  the 
fraction  sum  of  two  input  operands,  and  the  nonzero  result  will  be 
normalized  by  an  programmable  logic  array  (PLA)  and  a  shifter  (as  shown  in 
the  Figure  3.7  on  page  28). 

3.  Normalize  the  resulting  sum/difference 

The  fraction  sum /difference  may  result  in  four  possible  cases  as 

follows: 

case  1:  m[12 : 0]  =  lOXXX . 

case  2:  m[12 : 0]  =  IIXXX . 

case  3:  m[12 :  0]  =  OIXXX . 

case  4:  m[12 : 0]  =  OOXXX . 

Where  X  =  0  or  1.  A  check  is  made  to  see  if  m[12]  =  1;  this  is  actually  done  by 
shifting  the  fraction  one  bit  right,  and  adding  one  to  the  result  exponent,  i.e., 
case  1  and  case  2.  Note  that  the  least  significant  bit  being  lost  due  to 
truncation.  If  not,  the  result  is  shifted  left  until  a  nonzero  digit  appears  in  the 
MSB  (i.e.,  m[ll])  position,  and  decreasing  the  resultant  exponent  by  one  for 
each  shift,  i.e.,  case  4.  The  normalization  is  implemented  by  an  PLA  and  a 
shifter.  PLAs  are  implemented  as  two-level  logic  realization  of  a  sum-of- 
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Figtire  3.7  The  Normalizer  of  Floating-point  ADD  Unit 


products  expression.  The  output  signals  can  be  expressed  as  the  sum  (OR)  of 
several  intermediate  signals,  each  of  which  can  be  expressed  as  the  product 
(AND)  of  several  input  signals.  We  use  a  readily  available  PLA  that  is  in  the 
library  of  the  Genesil  Silicon  Compiler.  The  equations  that  determine  the 
logic  of  the  PLA  are  written  in  PLAEQ  (Ref.  17],  the  PLA  specification 
language,  and  contained  in  an  ancillary  file.  This  file  is  parsed  by  the  PLA 
parser  and  can  be  optimized  by  the  optimizer  of  choice.  The  Compiler  uses 
this  parsed  and  optimized  file  to  generate  the  layout,  simulation,  and  timing 
models  [Ref.  17].  The  Ancillary  File  parameter  is  used  to  name  the  PLA 
ancillary  file  that  contains  PLAEQ  coding.  This  coding  specifies  the  input  and 
output  signals  and  their  attributes  and  the  logic  to  be  implemented  by  the 
PLA.  In  the  normalization  of  the  floating-point  addition,  how  far  the  bit  is 
needed  to  shift  to  left  that  is  implemente*’  by  the  PLA  ancillary  file  which  is 
shown  in  below. 


CODEFILE 

INPUT  m[ll]; 
INPUT  m[10]; 
INPUT  m[9]; 
INPUT  m[8]; 
INPUT  m[7]; 
INPUT  m[6]; 
INPUT  m[5]; 
INPUT  m[4]; 
INPUT  m[3]; 
INPUT  m[2]; 
INPUT  m[l]; 
INPUT  m[0]; 
OUTPUT  g[3]; 
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OUTPUT  g[2]; 
OUTPUT  g[l]; 
OUTPUT  g[0]; 

CODING  (ROM) 


<  1 . >  0000 

<01 . >  0001 

<001 . >  0010 

<  0001 . >0011 

<  00001 . >  0100 

<  000001  . >  0101 

<  0000001 . >0110 


<  00000001 . . . .  >  0111, 
<  000000001 . . .  >  1000; 
<  0000000001 . . >  1001; 
<  00000000001  .  >  1010, 
<  000000000001  >  1011, 
<  000000000000  >  1100; 

END 


Design  the  shifter  used  to  shift  the  fraction  bit  to  the  left  (i.e., 
normalization)  that  is  very  similar  to  the  alignment  shift  network,  which  has 
described  in  Figure  3.5  on  page  25,  except  the  shifter  shift  bit  to  the  left. 

Note  that  two  M-bit  fraction  numbers,  when  subtracted,  may  result 
in  a  required  post  normalization  aligrunent  of  M-1  positions  [Ref.  6].  In  the 
event  of  m[12  :  11]  =  01,  the  post  normalization  step  is  skipped,  i.e.,  case  3. 
Step  3  is  shown  in  the  Figvire  3.3. 

4.  Adjusting  the  exponent 

As  step  1  described,  the  signal  (Ae[3])  directs  the  MUX_9  to  select  the 
larger  exponent.  In  step  3,  the  exponent  result  may  be  increased  one. 
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decreased  one  to  eleven,  or  no  change.  Note  that  the  decreasing  operation  is 
executed  by  a  two's  complement  convert. 

After  the  normalization,  if  the  adjusted  exponent  result,  de[3  :  0],  is 
0000  and  the  resultant  fraction  sum /difference  is  case  3  then  the  result  of  this 
floating-point  addition  is  zero  as  far  as  our  resolution  can  determine  (as 
shown  in  the  Figure  2.1  on  page  7).  If  the  resultant  fraction  sum/difference  is 
m[12  :  0]  then  this  indicates  that  the  floating-point  addition  has  resulted  in  a 
value  that  is  a  true  zero  [Ref.  11].  Note  that,  after  adjusting  the  exponent,  the 
result  exponent  (e[3  :  0])  will  be  selected.  Step  4  is  shown  in  Figure  3.8. 

5.  Exponent  overflow  and  underflow 

Overflow  occurs  whenever  one  of  the  incoming  exponents  is  1111 
(the  maximum  value)  and  the  result  of  the  addition  causes  the  exponent  to  be 
incremented.  This  sets  the  exponent  to  0000  which  is  an  indication  of 
overflow. 

Underflow  can  occur  only  when  the  normalization  was  executed  as 
described  in  step  3.  Figure  3.8  shows  the  m[12  : 11]  =  00  ensure  the  left  shifting 
of  fraction  has  happened,  and  the  carry-out,  Cout/  of  Adder_3  is  zero  to 
indicate  the  exponent  exceeds  the  limit  during  true  addition.  For  example, 
assume  the  larger  exponent  is  0001  (i.e.,  2'7),  m[12  :  0]  =  0001000000000,  and 
Cout  =  0.  After  the  left  shifting  of  fraction,  the  resulting  exponent,  2*9,  exceeds 
the  limit  and  shows  that  the  exponent  underflow  has  occurred. 

6.  Setting  of  the  sign  bit 

As  described  in  step  2  and  step  3,  the  resultant  sign  bit  of  floating¬ 
point  addition  is  decided  by  as,  bs,  and  m[12].  Figure  3.9  shows  the  truth  table, 
K  map,  and  the  gate  structure  of  this  logic. 
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Figure  3.8  The  Underflow,  Overflow,  and  Exponent  Operation  of  Floating¬ 
point  ADD  Unit 
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Figure  3.9  The  Logic  Design  for  Sign  Bit  in  ADD  Unit 


33 


C  THE  HARDWARE  DESIGN  OF  FLOATING-POINT  SUBTRACT  UNIT 
The  only  difference  between  floating-point  addition  and  floating-point 
subtraction  is  that  the  sign  bit  of  the  second  operand  (augend /minuend)  will 
be  reversed  (i.e.,  A  +  (-B)  =  A  -  (+B)  or  A  -  (-B)  =  A  +  (+B)).  Note  that  the 
difference  in  the  hardware  configuration  is  an  inverter  that  is  set  in  front  of 
the  sign  bit  of  the  second  operand  (i.e.,  B). 

D.  THE  HARDWARE  DESIGN  OF  FLOATING-POINT  FFT 

As  previously  described  in  Chapter  II,  the  hardware  configuration  of 
floating-point  FFT  (i.e.,  two-point  DFT  with  complex  number  input)  is 
composed  of  one  complex  ADD  (two  real  adder),  one  complex  SUBTRACT 
(two  real  subtracter),  and  one  complex  MULTIPLY  units.  The  complex 
multiply  consists  of  four  real  multiply  operations  and  two  real  add 
operations.  In  this  thesis,  the  simplest  floating-point  FFT  (i.e.,  two-point  DFT 
with  complex  number  input)  has  been  implemented.  The  block  diagram  of 
this  floating-point  FFT  is  shown  in  Figure  3.10.  The  two-point  DFT  is  shown 
in  Figure  2.4  and  Figure  2.5,  which  is  the  basic  processor  for  layer  FFTs,  is 
known  as  the  butterfly.  For  a  detailed  description  of  N-point,  where  N  is  a 
power  of  two,  floating-point  FFT  the  reader  is  refered  to  Reference  7. 
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A  =  al  +  a2  j 


W^=  wl  +  w2  j 


Figure  3.10  The  Block  Diagram  of  Floating-point  FFT 


IV.  DESIGN  VERinCATION 


The  GENESIL  can  simulate  a  digital  design  at  both  the  functional  and 
switch-level  to  verify  design  functionality.  Using  GENESIL,  the  user  specifies 
the  design  functionality  and  the  netlist,  and  builds  the  functional  simulation 
models.  The  simulator  uses  these  models,  which  contain  initialization 
conditions  and  additional  simulation  commands,  to  verify  the  operation  of 
the  design.  The  Simulator  can  be  controlled  directly  on  an  interactive  screen 
interface  or  run  in  batch  mode.  Figure  4.1  illustrates  the  Simulator 
environment  within  GENESIL  [Ref.  13-15].  A  GENESIL  design  (i.e.,  floating¬ 
point  ADD,  SUBTRACT,  MULTIPLY,  and  FFT  units  are  described  in  Chapter 
III)  is  specified.  The  simulation  is  performed  on  functional  models  which 
derived  from  the  block  specifications  and  netlists  [Ref.  15]. 

Four  different  simulation  models  (GFL,  FLATGFL,  FLATSGFL,  and  GSL) 
are  available  on  the  GENESIL  Simulation  Menu  [Ref.  15].  The  GFL  functional 
model  is  a  gate-level  model  used  for  general-purpose  simulation.  This  is  the 
simulation  model  utilized  for  this  thesis.  With  the  GFL  model,  all 
simulation  nodes  are  available,  and  the  design  hierarchy  is  preserved  for  use 
by  Simulator  commands.  For  example,  the  "pi"  command  will  list  the  inputs 
and  outputs  to  the  selected  instance  of  the  block,  module,  chip,  or  chipset  [Ref. 
15]. 

In  the  following  sections,  the  functional  simulation  results  of  floating¬ 
point  MULTIPLY  and  ADD  unit  will  be  described  by  examples.  Note  that 
some  special  cases  are  also  specified. 
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Figure  4.1  The  Simulation  Environment  within  GENESIL 
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A.  THE  SIMULATION  RESULT  OF  FLOATING-POINT  MULTIPLY  UNIT 


Example  1: 

The  floating-point  format,  7.0  can  be  represented  as: 

7.0  =  0  1010  11000000000 

Since  7.0  is  a  positive  number  and  the  sign  bit  is  0.  The  exponent  is  in 
excess-8  so  that  1010  represents  2.  The  fraction  is  1.11  =  1  (hidden  1)  +  2'^  +  2-2 
=  1  +  0.5  +  0.25  =  1.75.  Therefore  this  expression  is  for  a  value  of  1.75  x  22  =  7.0. 
When  simulation  is  executed  on  the  multiplication  of  7.0  by  7.0,  the 
simulation  result  obtained  from  Figure  4.2  is 

W  [15 : 0]  =  0  1101  10001000000 
that  can  be  verified  to  be  the  value  of  1.53125  x  2^  =  49. 

Note  that  x[ll]  =  1  illustrates  that  normalization  has  occurred.  Both  ov  = 
0  and  un  =  0  indicate  that  neither  overflow  nor  underflow  has  occurred.  The 
correct  result  shows  no  truncation  error  in  this  example. 

Example  2: 

Let  A  be  the  multiplier,  and  B  be  the  multiplicand.  The  values  for  A  and 
B  are  as  follows  (both  in  floating-point  format  and  decimal  format): 

A  =  (0 1010  11000100000)2  =  (1.765625  x  22)io 
B  =  (1  1010  11001000000)2  =  (1.78125  x  22)io. 

Simulation  in  GFL  gives  the  result  in  Figure  4.3  as: 

W[15 :  0]  =  1 1101  10010010100 
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that  can  be  verified  to  the  value  of  -1.572265625  x  2  =  -50.3125.  Note  that  the 
correct  product  of  A  multiplied  by  B  is  -50.3203125.  The  difference  value, 
0.0078125  (=  50.3203125  -  50.3125),  is  due  to  tnmcation  error. 

Example  3: 

Let  A  =  1 1100  11000000000  =  (-1.75  x  24)io 

B  =  1 1100  11000000000  =  (-1.75  x  24)io, 
the  correct  product  of  A  multiplied  by  B  is  (784)io. 

Simulation  in  GFL  gives  the  result  in  Figure  4.4  as: 

W[15  :  0]  =  0 1001 10001000000  =  (3.0625)io 
The  ov  =  1  indicates  an  exponent  overflow  has  occurred  and  therefore  the 
value  in  W[15  :  0]  is  useless. 

Example  4: 

Let  A  =  1  1 100  11000000000  =  (-1.75  x  24)io 

B  =  1  1011  11000000000  =  (-1.75  x  23)io, 
the  correct  product  of  A  multiplied  by  B  is  (392)io. 

Simulation  in  GFL  gives  the  result  in  Figure  4.5  as: 

W[15  :  01  =  0 1000  10001000000  =  (1.53125)io. 

The  ov  =  1  indicates  an  exponent  overflow,  which  is  caused  by 
normalization,  has  occurred  and  therefore  the  value  in  W[15  :  01  is  useless. 

Example  5: 

Let 
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A  =  1  0011  00100000000  =  (-1.125  x  2-5)io 
B  =  1  0011  00100000000  =  (-1.125  x  2-5)io, 


the  correct  product  of  A  multiplied  by  B  is  (0.00123596)io. 

Simulation  in  GFL  gives  the  result  in  Figure  4.6  as: 

W[15  :  0]  =  0  0110  01000100000  =  (0.31640625)io. 

The  un  =  1  indicates  an  exponent  underflow  has  occurred  and  therefore 
the  value  in  W[15  :  0]  is  useless.  Note  that  no  normalization  is  needed,  the 
reason  is  x[ll]  =  0. 

Example  6: 

Let  A  =  0  0100  00100000000  =  (1.125  x  2-^)io 

B  =  0  0100  00100000000  =  (1.125  x  2-4)io, 
the  correct  product  of  A  multiplied  by  B  is  (0.00494385)  lo. 

Simulation  in  GFL  gives  the  result  in  Figure  4.7  as: 

W[15  :  0]  =  0  0000  00000000000. 

This  result,  W[15  :  0],  is  forced  to  zero  as  described  in  Chapter  in  on  page 
19.  Note  that  x[ll]  =  0  and  ti=  1  directing  MUX_0  to  select  zero  for  the  result. 

Example  7: 

Let  A  =  0  0000  00000000000 

B  =  01101  10101100000. 

Simulation  in  GFL  gives  the  result  in  Figure  4.8  as: 

W[15  :  0]  =  0  0000  00000000000. 

Figure  3.1  shows  that  t2  =  1  directs  MUX_1  to  select  the  true  zero  result. 
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Figure  4.2  The  Simulation  Result  of  Example  1 
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Figure  4.3  The  Simulation  Result  of  Example  2 
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Figure  4.4  The  Simulation  Result  of  Example  3 
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Figure  4.5  The  Simulation  Result  of  Example  4 
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Figure  4.6  Tlie  Simulation  Result  of  Example  5 
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Figure  4.7  The  Simulation  Result  of  Example  6 
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Module;  ~genluck/luck/FLProult 

- Genesil  Version  v8.0.2 


Functional  Simulator 


0 

b(81 

1 

b(7] 

0 

bl6] 

1 

b(5] 

1 

b(4] 

0 

b(31 

0 

b(2} 

0 

b(l] 

0 

b[0] 

0 

BACK 

CYCLE 

4 

pi 

)  is  of  type  module  with  22  ports 
)  port  1  I  TRUE  to  NC  “  II 
)  port  3  I  FALSE  to  NC  -  l. 

)  port  5  I  a(15;0J  to  NC*16  “  J.l.l.i  l  i.i.i  l.U.I.l  l.l.l. 

)  port  7  I  b(15:01  to  NC*16  -  l.miLllllLllLHIil.LLl.l. 

)  port  9  O  s|l5:0j  to  NC*16  -  001011 03 01 1 00000 

)  port  11  O  ov  to  NC  “  0 

)  port  13  O  cout  to  NC  “  0 

)  port  15  0  W(15i0]  to  NC*16  -  0000000000000000 
)  port  17  O  x[ll:0j  to  NC*12  -  011010110000 
)  port  19  0  LS_OUT(lliO]  to  NC*12  -  000000000000 
)  port  21  O  un  to  NC  “  0 


INSERT 

MESSAGES  GRAPHICS 

OVERIAY 

RECORD  UTILITY 

BACK 

QUERY 

HIER  I.EVEL 

ENVIRONMENT 

SCREENS 

BIND 

CYCLE 

RUN  VECTORS 
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ASSERT 
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UNBIND 

PROPAGATE 

VERIFY  VAI.UE 

Coiiunand: 

>SlMtnATION> 


Figure  4.8  The  Simulation  Result  of  Example  7 
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B.  THE  SIMULATION  RESULT  OF  FLOATING-POINT  ADD  UNIT 


Example  8: 

Let  A  and  B  be  the  two  input  operands  of  the  floating-point  ADD  unit. 
Assume 

A  =  0  1010 11000000000  =  (1.75  x 
and  B  =  0  1010 11000000000  =  (1.75  x  2^)io. 

When  simulation  is  executed  on  the  addition  of  7.0  plus  7.0,  the  sum  of  A 
and  B  is  shown  in  the  Figure  4.9 

Dout  [15 : 0]  =  0  1011 11000000000 

that  can  be  verified  to  be  the  value  of  1.75  x  2^  =  14. 

Note  that  Ae[3  :  0]  =  0000  illustrates  the  exponent  parts  of  two  input 
operands  are  equal.  Thus  no  alignment  is  needed.  Both  ov  and  un  are  zero 
showing  that  neither  an  overflow  or  an  underflow  has  occurred.  The  result 
does  not  suffer  from  truncation  error  in  this  example. 

Example  9: 

Let  A  =  0  1001 11000000010  =  (1.75097656  x  2)io 

B  =  0  1100 11000000000  =  (1.75  x  2*ho, 
the  correct  sum  of  A  plus  B  is  (31.501953125)io. 

Simulation  in  GFL  gives  the  result  in  Figure  4.10  as: 

Dout  [15 : 0]  =  0 1100 11111000000  =  (31.5)io- 


I 
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The  Ae[3]  =  1  illustrates  A<B,  and  A  will  be  aligned.  Note  that  the 
difference  value,  0.001953125  (=  31.501953125  -  31.5)  is  due  to  truncation  error 
that  is  caused  by  alignment. 

Example  10: 

Let  A  =  0  1000  01111110110  =  (1.4951171875)io 

B  =  1  0100  10000100000  =  (- 1.515625  x  2-4)io, 
the  correct  sum  of  A  plus  B  is  (1.400390625)io. 

Simulation  in  GFL  gives  the  result  in  Figure  4.11  as: 

Dout  [15  : 0]  =  0  1000  01100110100  =  (1 .400390625)  lo- 
This  example  shows  a  true  subtraction,  (+A)  +  (-B),  as  described  in 
Chapter  HI  on  page  24.  The  sel  =  1  converts  the  sum  of  fraction  part  (i.e.,  K[12  : 
0])  to  sign-magnitude  form  (i.e.,  m[12 :  0]). 

Example  11: 

Let  A  =  1  0110  01110000100  =  (- 1.439453125  x  2-2)io 

B  =  0  0110  10001000000  =  (1.53125  x  2-2)io, 
the  correct  sum  of  A  plus  B  is  (0.02294921875)io. 

Simulation  in  GFL  gives  the  result  in  Figure  4.12  as: 

Dout  [15 : 0]  =  0  0010  01111000000  =  (1.53125  x  2-2)io. 

The  g[3  :  0]  =  0100  illustrates  that  normalization  has  occurred,  and  shifting 
four  digits  left  is  executed  in  the  final  fraction  part  (i.e.,  f[10  :  0)).  Note  that  the 
exponent  part  (i.e.,  de[3  :  0])  has  adjusted  as  part  of  the  normalization  process. 

Example  12: 
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Let 


A  =  0  0001 11101000000  =  (1.90625  x  2-7)io 
B  =  1  0001  00111111000  =  (- 1.18359375  x  2-7)io, 
the  correct  sum  of  A  plus  B  is  (0.00564575195)io. 

Simulation  in  GFL  gives  the  result  in  Figure  4.13  as: 

Dout  [15 : 0]  =  0 1111 10100100000  =  (210)io. 

The  un  =  1  indicates  an  exponent  imderflow  has  occurred  and  therefore  the 
value  in  Dout[15  :  0]  is  useless. 

Example  13: 

Let  A  =  1  nil  11000000000  =  (- 1.75  x  27)io 

B  =  1  nil  10010000000  =  (- 1.5525  x  27)io, 
the  correct  sum  of  A  plus  B  is  (-  424)io. 

Simulation  in  GFL  gives  the  result  in  Figure  4.14  as: 

Dout  [15 : 0]  =  1  0000  00000000000. 

The  ov  =  1  indicates  an  exponent  overflow  has  occurred  and  therefore  the 
value  in  Dout[15  :  0]  is  useless. 

Example  14: 

Let  A  =  0  0000  00000000000  =  (0)io 

B  =  1  1100  01000100000  =  (- 1.265625  x  2*ho^ 
the  correct  sum  of  A  plus  B  is  (-  20.25)io. 

Simulation  in  GFL  gives  the  result  in  Figure  4.15  as: 

Dout  [15 : 0]  =  1 1100  01000100000  =  (-  20.25)io. 

Note  that  the  true  zero  operand.  A,  which  is  successfully  detected. 
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Example  15: 
Let 


A  =  0  1010 11000000000  =  (1.75  x  24)io 
B  =  1  1010  11000000000  =  (-1.75  x 
the  cxwrect  sum  of  A  plus  B  is  (0)io. 

Simulation  in  GFL  gives  the  result  in  Figure  4.16  as: 

Dout  [15 : 0]  =  0  0000  00000000000  =  (0)io. 
Note  that  this  example  shows  (A)  +  (-A)  =  0. 
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Genesll  Screen  Dump  —  Thu  Apr  11  10:58:46  1991 


******* 

Module: 


k  *****  j 


)  port 
)  port 
)  port 
)  port 
)  port 
)  port 
)  port 
)  port 
)  port 
)  port 
)  port 
)  port 
)  port 
)  port 
)  port 
)  port 
)  port 
)  port 
)  port 
)  port 
)  port 
)  port 
)  port 
)  port 
)  port 
)  port 
)  port 
)  port 
)  port 
)  port 
)  port 
)  port 
)  port 
)  port 
)  port 


************1 
'genluck/luck/FLPadder 

- Genesll  Version  V0.O.2- 

I  TRUE  to  NC  -  H 
I  FALSE  to  NC  -  L 
O  Cout  to  NC  “  0 

O  DoutI15:0)  to  NC*16  -  0101111000000000 
O  Ael3:0]  to  NC*4  -  0000 


Functional  Simulator 


11  O  Be[3:0]  to 
13  O  ae[3:0}  to 
15  Cl  phasea  to 
17  Cl  phase_b  to 
19  O  t2  to  NC  - 
21  0  be [3:0]  to 
23  0  SE[3:0]  to 
25  O  align[ll:0] 

27  O  am[]0:0]  to 
29  0  bm(10:01  to 
31  O  ce[3:0]  to 
33  0  Kfl2:0]  to 
35  0  Bb[12:01  to 
37  I  A[15:01  to 
39  I  B[15:0}  to 
41  O  abs  to  NC  -  0 
43  0  aligned[ll:0] 

45  O  as  to  NC  -  0 

47  0  bs  to  NC  -  0 

49  0  sel  to  NC  -  0 
51  0  sign  to  NC  -  0 
53  0  ffl0:0)  to  NC*1] 

55  0  de(3:0]  to  NC»4  - 
57  O  tl  to  NC  =  0 

59  0  9[3:0]  to  NCM  - 
61  0  m(12:0)  to  NC*13 
63  O  di3:0]  to  NCM  -  0001 
65  O  e[3:0]  to  NC*4 
67  O  ov  to  NC  “  0 

69  0  un  to  NC  -  0 


NC*4  -  0000 
NC»4  -  1010 
NC  “  1 
NC  “  0 

0 

NCM  -  1010 
NC*4  -  0000 

to  NC*12  “  111000000000 
NC*11  -  11000000000 
NC*11  -  11000000000 
NC»4  -  1010 
NC‘13  -  1110000000000 
NC*13  -  0111000000000 
NC*16  -  LHLHLHHLLI.I.LXJ.LL 
NC*16  -  IJILHLU1ILLI,IJ,I.LLL 


to  NC*12  “  111000000000 


-  11000000000 

ion 

0000 

-  1110000000000 


ion 
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>S1MUIATI0N> 


Figure  4.9  The  Simulation  Result  of  Example  8 
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Genesil  Screen  Dujnp  —  Thu  Apr  11  11:17:10  1991 


**AAAAA**AAAA***AA*A***^******************************************************** 


Module; 


~genluck/luck/PLPadder 

- Genesil  Version  v8.0.2 


Functional  Simulator 


) 

port 

1  I 

TRUE  to 

NC  -  H 

) 

port 

3  I 

FALSE  to 

NC 

-  L 

) 

port 

5  0 

1  Cout  to 

NC  -  0 

) 

port 

7  0 

1  DOUtll5:0]  to  NC*16  -  0110011111000000 

) 

port 

9  0 

1  Ae(3:0] 

to  NC»4  -  1101 

) 

port 

11 

0  Bet3:0] 

to 

NC*4  -  0011 

) 

port 

13 

0  ae[3:0j 

to 

NC*4  -  1001 

) 

port 

15 

Cl  phase_a 

to 

NC  -  1 

) 

port 

17 

Cl  phase  b 

to 

NC  -  0 

) 

port 

19 

0  t2  to 

NC  - 

0 

) 

port 

21 

0  be[3:0] 

to 

NC*4  -  1100 

) 

p>ort 

23 

0  SE(3:0] 

to 

NC*4  -  0011 

) 

port 

25 

0  allgn(ll 

:0] 

to  NC*12  -  111000000010 

) 

port 

27 

0  am(10:0] 

to 

NC»11  -  11000000010 

) 

port 

29 

0  bm(10:0] 

to 

NC*11  -  11000000000 

) 

port 

31 

0  cei3:0] 

to 

NC‘4  -  1100 

) 

port 

33 

0  K[12:0i 

to 

NC*13  -  0111111000000 

) 

port 

35 

0  Bb[12:0) 

to 

NC»13  -  0111000000000 

) 

port 

37 

I  A(15:0) 

to 

NC*16  -  LHIXHHHLLLI.LLLI1L 

) 

port 

39 

I  B(15:0] 

to 

NC*16  -  LnHLLHIILI.LLLLU,L 

) 

port 

41 

0  abs  to 

NC  ■ 

-  0 

) 

port 

43 

0  allgned(ll :0]  to  NC*12  -  000111000000 

) 

port 

45 

0  as  to 

NC  “ 

0 

) 

port 

47 

0  bs  to 

NC  “ 

0 

) 

port 

49 

0  sel  to 

NC  ■ 

-  0 

) 

port 

51 

0  sign  to 

1  NC 

-  0 

) 

port 

53 

0  fI10:0] 

to 

NC‘ll  -  11111000000 

) 

port 

55 

0  de(3:01 

to 

NC»4  -  1100 

) 

port 

57 

0  tl  to 

NC  “ 

0 

) 

port 

59 

0  gI3:0) 

to 

NCM  -  0000 

) 

port 

61 

0  m(12i0] 

to 

NC*13  -  0111111000000 

) 

port 

63 

0  d[3:0] 

to 

NC*4  -  0000 

) 

port 

65 

0  e(3:0] 

to 

NCM  -  1100 

) 

port 

67 

0  ov  to 

NC  “ 

0 

) 

port 

69 

0  un  to 

NC  - 

0 
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Figure  4.10  The  Simulation  Result  of  Example  9 


53 


Genesil  Screen  Dump  —  Thu  Apr  11  11:22:04  1991 

******************************************************************************** 
Module:  "genluck/luck/FLPadder  Functional  Simulator 

- Genesil  Version  v8.0.2 - 

)  port  1  I  TRUE  to  NC  -  H 
)  port  3  I  FALSE  to  NC  -  L 
)  port  5  0  Cout  to  NC  -  0 

)  port  7  0  Dout(15;0]  to  NC*16  -  0100001100110100 
)  port  9  0  Ae(3:0]  to  NC*4  -  0100 
)  port  11  O  Be(3:0]  to  NC*4  -  1100 

)  port  13  O  aef3:0j  to  NC*4  -  1000 

)  port  15  Cl  phase_a  to  NC  -  1 

)  port  17  Cl  phase_b  to  NC  -  0 

)  port  190t2  to  NC“0 
)  port  21  O  be(3:0]  to  NC*4  -  0100 
)  port  23  O  SEi3:0]  to  NC*4  -  0100 
)  port  25  O  align(ll:0]  to  NC*12  -  IIOOOOIOOOOO 
)  port  27  0  am(10:0]  to  NC*11  -  OllllllOllO 
)  port  29  O  bm{10:01  to  NC*11  -  10000100000 
)  port  31  O  ce(3:0]  to  NC*4  -  1000 
)  port  33  0  K[12:01  to  NC*13  -  lOlOOllOOl 100 
)  port  35  O  Bb[12:0)  to  NC*13  -  0101111110110 
)  port  37  I  A(  15:0]  to  NC‘16  -  LHLLLLIIHWIIIIIII.HUL 
)  port  39  I  B(15:0i  to  NC*16  -  III.HLUILLUJILLLLL 
)  port  41  O  abs  to  NC  ■=  1 

)  port  43  0  allgned[ll :0]  to  NC*12  -  000011000010 

)  port  45  O  as  to  NC  -  0 

)  port  47  0  bs  to  NC  -  1 

)  port  49  O  sel  to  NC  »  1 

)  port  51  0  sign  to  NC  -  0 

)  port  53  0  f[10:0]  to  NC*11  -  01100110100 

)  port  55  0  de(3:01  to  NCi  -  1000 

)  port  57  O  tl  to  NC  -  0 

)  port  59  0  g(3:0]  to  NC*4  -  0000 

)  port  61  0  mil2:0]  to  NC*13  -  0101100110100 

)  port  63  0  dI3:0]  to  NC*4  -  0000 

)  rort  65  0  e(3:oi  to  NC*4  -  1000 

)  port  67  0  ov  to  NC  “  0 

)  port  69  0  un  to  NC  “  0 
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>S1MULATI0N> 


Figure  4.11  The  Simulation  Result  of  Example  10 
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»»«»****»***»**»****»********»*********»**************************•**'********** 
Genesll  Screen  Dump  —  Thu  Apr  11  11:29:38  1991 

******************************************************************************** 
Module:  ~genluck/luck/FLPadder  Functional  Simulator 

- Genesil  Version  V0.O.2 - 

)  port  1  1  TRUE  to  NC  -  II 
)  port  3  I  FALSE  to  NC  -  L 
)  port  5  0  Cout  to  NC  ~  1 

)  port  7  O  Dout[l5:0]  to  NC*16  -  0001001111000000 
1  port  9  0  Ae[3:0]  to  NC*4  -  0000 
)  port  11  O  Be[3:0)  to  NC‘4  -  0000 

)  port  13  O  ae[3:0]  to  NC»4  -  0110 

)  port  15  Cl  phase_a  to  NC  -  1 

)  port  17  Cl  phase_b  to  NC  -  0 

)  port  19  O  t2  to  NC  -  0 

)  port  21  O  be[3:01  to  NC*4  -  0110 

)  port  23  O  SE(3:0i  to  NC‘4  -  0000 

)  port  25  O  align [11 :0J  to  NC«12  -  110001000000 

)  port  27  O  am[10:0]  to  NC*11  -  01110000100 

)  port  29  O  bm(10:0]  to  NC*11  -  10001000000 

)  port  31  0  cei3:0]  to  NC*4  -  0110 

)  port  33  O  K(12:0]  to  NC*13  -  1111101000100 

)  port  35  0  Bb[12:0]  to  NC*13  -  1101110000100 

)  port  37  I  A[15:0]  to  NC*16  -  HLI1HLLI1HIII.LLI,HLL 

)  port  39  I  Bl  15:0]  to  NC*16  -  LLIIHLlIl.U.IlIXt.LU. 

)  port  41  0  abs  to  NC  -  0 

)  port  43  O  allgned[ll;0]  to  NC‘12  -  110001000000 

)  port  45  0  as  to  NC  “  1 

)  port  47  0  bs  to  NC  -  0 

)  port  49  0  sel  to  NC  -  1 

)  port  51  0  sign  to  NC  -  0 

)  port  53  O  fOO.O]  to  NC‘1  1  -  01111000000 

)  port  55  O  de[3:0]  to  NC*4  -  0010 

)  port  57  O  tl  to  NC  -  0 

)  port  59  O  g(3:0]  to  NC*4  -  0100 

)  port  61  O  m(12.0]  to  NC*13  -  0000010111100 

)  port  63  0  d[3:0]  to  NC*4  -  1100 

)  port  65  0  e{3:0j  to  NC«4  “  0010 

)  port  67  0  ov  to  NC  “  0 

)  port  69  0  un  to  NC  -  0 
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BACK 
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Comma  nd : 
>SIMULATI0N> 


Figure  4.12  The  Simulation  Result  of  Example  11 
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Genesil  Screen  Dump  —  Thu  Apr  li  11:59:42  1991 

******************************************************************************** 
Module:  ~genluck/luck/FLPadder  Functional  Simulator 

- Genesil  Version  v8.0.2 - 

)  port  1  I  TRUE  to  NC  -  n 
)  port  3  I  FALSE  to  NC  -  L 
)  port  5  0  Cout  to  NC  “  0 

)  port  7  O  Dout[15:0]  to  NC*16  -  0111110100100000 
)  port  9  O  Ae[3:0]  to  NC*4  “  0000 
)  port  11  0  Be[3:0]  to  NC*4  =  0000 

)  port  13  O  ae[3:0]  to  NC*4  •=  0001 

)  port  15  Cl  phase_a  to  NC  “  1 

)  port  17  Cl  phase_b  to  NC  -  0 

)  port  19  0  t2  to  NO  “  0 
)  port  21  O  be[3.0]  to  NC»4  -  0001 

)  port  23  O  SE[3:0]  to  NC*4  -  0000 

)  port  25  O  allgn[ll:0]  to  NC*12  -  100111111000 
)  port  27  0  am(10:0]  to  NC*11  -  11101000000 

)  port  29  0  bmil0:01  to  NC‘ll  “  00111111000 

)  port  31  O  ce[3:0)  to  NC*4  -  0001 

)  port  33  O  K(12:0]  to  NC*13  -  1101010111000 

)  port  35  0  Bb[12:0]  to  NC*33  -  0111101000000 
)  port  37IA[15:01  to  NC*16“  LLLI.IimillLllU.1 .1.1.1. 

)  port  39  I  b|15:0]  to  NC*16  -  III.I J.IILI.HII1IIIHULI.L 
)  port  41  O  abs  to  NC  -  1 

)  port  43  O  aligned(ll:0]  to  NC*12  -  lOOllilllOOO 

)  port  45  0  as  to  NC  “  0 

)  port  47  0  bs  to  NC  “  1 

)  port  49  0  sel  to  NC  -  1 

)  port  51  0  sign  to  NC  -  0 

)  port  53  0  f[10:0]  to  NC*ll  -  10100100000 

)  port  55  0  det3:0]  to  NCM  “  1111 

)  port  57  0  tl  to  NC  “  0 

)  port  59  O  g(3i0]  to  NC*4  -  0010 

)  port  61  0  m(12:0]  to  NC*13  -  0010101001000 

)  port  63  O  d[3:0]  to  NC*4  -  1110 

)  port  65  0  e(3:0]  to  NC‘4  -  1111 

)  port  67  O  ov  to  NC  “  0 

)  port  69  0  un  to  NC  -  1 


INSERT 

MESSAGES  GRAPHICS 

OVEI<r.AY 

RECORD  UTILITY 

BACK 

QUERY 

IlIER  LEVEL 

ENVIRONMENT 

SCREENS 

BIND 

CYCLE 

RUN  VECTORS 

SCROr.I. 

FIGHTS 

ASSERT 

STEP 

UNBIND 

PROPAGATE 

VERIFY  VALUE 

Command : 
>S1MULATI0N> 


Figure  4.13  The  Simulalion  Result  of  Example  12 
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***»*****»*»*';******«»**«******«***»»*»**»<»*****«***»************************** 
Gei)esil  Screen  Dump  —  Tliii  Apr  11  12;  03;  18  1991 

******************************************************************************** 
Module:  “genluck/luck/FLPadder  Functional  Simulator 

- Genesil  Version  v8.0.2 - 

)  port  1  I  TRUE  to  NC  -  II 
)  port  3  I  FALSE  to  NC  -  1. 

)  port  5  0  Cout  to  NC  -  1 

)  port  7  0  Dout(15:0J  to  NC*16  -  1000000000000000 
)  port  9  O  Aet3:0]  to  NC*4  -  0000 
)  port  11  0  Be[3:0]  to  NC*4  “  0000 
)  port  130ae(3:0]  to  NC*4“1111 
)  port  15  Cl  phase_a  to  NC  “  1 

)  port  17  Cl  phase_b  to  NC  “  0 

)  port  19  0  t2  to  NC  “  0 

)  port  21  O  be(3:0]  to  NCM  =  1111 

)  port  23  0  SEPiO]  to  NCM  -  0000 

)  port  25  O  align[ll!0]  to  NC*12  -  110010000000 
)  port  27  O  am[10:0]  to  NC*11  -  11000000000 

)  port  29  O  bmllOiO]  to  NC*11  -  10010000000 

)  port  31  O  06(3:0]  to  NC»4  •>  1111 

)  port  33  0  K[12:0]  to  NC*13  “  1101010000000 
)  port  35  O  Bb{12:0]  to  NC*13  -  1111000000000 
)  port  37  I  Af  15:0]  to  NC'*16  ■=  IIHIIllllllllI.I.UJ.II.M. 

)  port  39  1  D(15:0]  to  NC«16  -  IIIIIIIIIIIILMll.I.l.Ll.I.r. 

)  port  41  O  abs  to  NC  -  1 

)  port  43  O  alignedlll :0]  to  NC*12  =  110010000000 

)  port  45  0  as  to  NC  -  1 

)  port  47  O  bs  to  NC  -  1 

)  port  490  sel  to  NC  -  0 

)  port  51  O  sign  to  NC  -  1 

)  port  53  O  f{10:0)  to  NC-n  00000000000 

)  port  55  O  de(3:0]  to  NC*4  -  oooO 

)  port  57  O  tl  to  NC  “  1 

)  port  59  0  g(3:0]  to  NC‘4  “  0000 

)  port  61  O  m(]2:0)  to  NC‘13  =  1 301010000000 

)  port  63  0  d[3:0]  to  NCM  -  0001 

)  port  65  0  e (3:0]  to  NC*4  -=  0000 

)  port  67  0  ov  to  NC  -  1 

)  port  69  0  un  to  NC  -  0 


INSERT 

MESSAGES  GRAPHICS 

OVEIUAY 

RECORD  UTILITY 

BACK 

QUERV 

KIER  LEVEL 

ENVIRONMENT 

SCREENS 

BIND 

CYCLE 

RUN  VECTORS 

SCR  01  ,r. 

FIGHTS 

ASSERT 

STEP 

UNBIND 

PROPAGATE 

VERIFY  VA1.UE 

Conunand : 
>SIMUIATI0M> 


Figure  4.14  The  Simulation  Result  of  Example  13 
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A****************************************'************************************' 

Genesil  Screen  Dump  —  Thu  Apr  13  32:07:28  3991 


*********A**tkA****«*******««************** 


Module:  ~genluck/luck/FLPadder 

- Genesil  Version  v8.0.2 


Functional  Simulator 


)  port  1  I  TRUE  to  NC  -  H 
)  port  3  I  FALSE  to  NC  =  L 
)  port  5  O  Cout  to  NC  “  0 

)  port  7  O  Doutll5:01  to  NC*16  -  1110001000100000 
)  port  9  0  Ae(3:0]  to  NC»4  -  0100 
)  port  31  O  Be(3:0]  to  NCM  -  1100 

)  port  13  O  aei3:0]  to  NC*4  -  0000 

)  port  15  Cl  phase_a  to  NC  -  1 

)  port  17  Cl  phase_b  to  NC  -  0 

)  port  19  O  t2  to  NC  “  1 
)  port  21  0  be[3:0]  to  NCM  -  1100 

)  port  23  O  SE[3:0]  to  NCM  -  0100 

)  port  25  O  align(ll:01  to  NC*12  -  lOlOOOlOOOOO 
)  port  27  0  ani[10:0)  to  NC‘ll  -  00000000000 

)  port  29  0  bmI10:0)  to  NCMl  -  01000100000 

)  port  31  0  ce(3:0]  to  NC*4  “  0000 

)  port  33  O  K(12:0]  to  NCM3  *■  1100010100010 

)  port  35  0  Bb(12:0]  to  NC‘13  -  0100000000000 
)  port  37  I  A[15:0]  to  NC*16  “  LLLIXLLLLI.LLt.Ii.L 

)  port  39  I  B(15:0j  to  NC*16  -  Hi»lLI.LHLLLI3LI.I,LL 

)  port  41  0  abs  to  NC  =  1 

)  port  43  0  allgned[ll:0]  to  NC‘12  -=  000010100030 

)  port  450as  to  NC“0 

)  port  47  0  bs  to  NC  “  1 

)  port  49  0  sel  to  NC  “  1 

)  port  51  0  sign  to  NC  0 

)  port  53  0  f  1 10:0]  to  NC*11  -  JnliDMluOO 

)  port  55  O  del 3:0]  to  NCM  '  11  in 

)  port  57  0  tl  to  NC  -  0 

)  port  59  0  gl3:0]  to  NCM  0010 

)  port  61  0  mll2:0]  to  NCM3  -  0011101011110 

)  port  63  0  d(3:0]  to  NCM  -  1310 

)  port  65  0  e(3:0]  to  NCM  -  1110 

)  port  67  0  ov  to  NC  -  0 

)  port  69  0  un  to  NC  =  1 


INSERT 

MESSAGES  GRAPHICS 

OVERIA.Y 

RECORD  UTILITY 

BACK 

QUERY 

HIER  LEVEL 

ENVIRONMENT 

SCREENS 

BIND 

CYCI.E 

RUN  VECTORS 

SCROLL 

FIGHTS 

ASSERT 

STEP 

UNBIND 

PROPAGATE 

VERIFY  VALUE 

Coiiuiiand : 
>SIMUtATION> 


Figure  4.15  The  Simulation  Result  of  Example  14 
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1titk*****k*ii*t***1t%****t*ii-k1tklitii1i*\tit1tii*A.***A*i.*******************'A******’^****-t*** 

Genesil  Screen  Dump  —  Mon  May  6  16:50:35  1991 


ttt1titk1t*^1i*****1t-k***1t**it***ii*1t1t1t1fk**i*-k-k^******-k***1t*k****-k*********^'k**^**1fk***** 


Module: 


(jenluck/1  uc;k/t'L.l“adder 

- Genes! 1  Version  v8. 0.2- 


Functional  Simulator 


0 

niio) 

1 

n|9i 

1 

8(8:0) 

000000000 

BACK 

CYCLE 

4 

pi 


) 

i  s 

of 

type  module  with  182  i>orts 

) 

port 

1  1 

TRUE  to 

NC  “ 

II 

) 

port 

3  I 

FAI,SE  to 

NC 

-  L 

) 

port 

5  C 

1  Cout  to 

NC  - 

0 

) 

port 

7  C 

1  Dout(15:0] 

tc 

1  NC*16  -  0000000000000000 

) 

port 

9  0 

1  Ae(3:01  to  NC*4  -  0000 

) 

port 

11 

0  Be(3:01 

to 

NC*4  -  0000 

) 

port 

13 

0  ae[3:0] 

to 

NC*4  -  1010 

) 

port 

15 

Cl  phase_a 

to 

NC  -  1 

) 

poi  t 

17 

CT  pliase  b 

to 

NC  “  0 

) 

port 

19 

0  t2  to  NC  - 

0 

) 

port 

21 

0  be(3:01 

to 

NC‘4  -  1010 

) 

port 

23 

0  SE13:0i 

to 

NCM  =  0000 

) 

por  L 

25 

0  align (11 : 

01 

to  NC*12  -  111000000000 

) 

port 

27 

0  am(10:0) 

to 

NC*]1  -  11000000000 

) 

port 

29 

0  bin(]0:0) 

to 

NC‘11  -  11000000000 

) 

port 

31 

0  ce  j  3 : 0) 

to 

NCM  1010 

> 

port 

33 

0  0(15:0) 

to 

NC‘16  -  0101011000000000 

) 

poi  t 

35 

0  K(  12:0) 

to 

MC*13  "  0000000000000 

) 

port 

37 

0  Ubll2:0) 

to 

NC*]3  -  0111000000000 

) 

por  t 

39 

0  nlll:01 

to 

NCM2  -  000000000000 

) 

por  t 

41 

I  A(]5:01 

to 

NCM  6  -  LIII,HI.nm.U.U.M.I.L 

) 

por  L 

43 

1  8(15:01 

to 

mcmc  -  iiin.iiMiiii.u.i.Lbi,u. 

> 

poi  l 

45 

O  abs  to 

NC  “  1 

in.SF.RT 

MESSAGES  GRAPH  ITS 

OVERIVW 

RECORll  UTIl.ri'Y 

BACK 

QUERY 

IIIER  LEVEL 

ENVIRONMENT 

SCREENS 
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CYCLE 

RUN  VECTORS 

SCROLL 

FIGHTS 

ASSERT 

STEP 

UNftIHD 

PROPAGATE 

VERIFY  VAI.UE 

Command: 


Figure  4.16  Tlte  Simulation  Result  of  Example  15 
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V.  CONCLUSIONS  AND  RECOMMENDATIONS 


A.  CONCLUSIONS 

The  purpose  of  this  thesis  was  to  design  the  floating-point  hardware  of 
the  multiplier,  adder  and  subtracter  for  FFT  operation.  The  Genesil  Silicon 
Compiler  system  was  used  to  overcome  the  shortcomings  of  the  time 
consuming  custom  graphical  layout  methods.  The  Genesil  Silicon  Compiler 
system  (V  8.0.2),  currently  licensed  at  the  Naval  Postgraduate  School, 
provides  IC  designers  with  the  capability  to  extend  the  Genesil  Silicon 
Compiler  Library  with  fully  parameterized  cells  that  work  with  Genesil 
verification  and  floorplanning  tools  [Ref.  13].  The  Genesil  Silicon  Compiler 
system  greatly  increases  the  ability  to  verify  the  designs  and  aids  in  wiring. 

A  16  bit  reduced  word  size  floating-point  arithmetic  unit  for  high  speed 
signal  analysis  was  implemented  in  this  thesis.  The  design  algorithm  fcr 
floating-point  arithmetic  units  is  introduced  in  Chapter  II;  the  algorithm  also 
supports  the  hardware  design  of  floating-point  arithmetic  units  that  are 
described  in  Chapter  III.  Other  efforts  of  this  thesis  are  the  layout  verification, 
functional  simulation  result,  and  Timing  Analysis  (TA)  [Ref.  16]  of  the 
floating-point  units.  The  validated  designs  can  be  used  to  develop  the  high 
speed,  pipelined  floating-point  FFT  units. 

Table  5.1  shows  the  timing  analysis  of  the  floating-point  MULTIPLY  unit 
and  ADD  unit.  Note  that  in  Table  5.1  the  worst  case  delay  of  the  floating¬ 
point  MULTIPLY  unit  is  64.4  ns,  while  the  delay  of  the  ADD  unit  is  215.8  ns. 
That  is,  the  maximum  clock  rates  are  about  15.5  MHz  for  multiplier  and  4.6 
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MHz  for  adder.  Table  5.1  also  shows  the  floating-point  ADD  unit  has  greater 
area  than  the  floating-point  MULTIPLY  unit. 


TABLE  5.1  THE  OUTPUT  DELAY  AND  SIZE  FOR  FLOATING¬ 
POINT  MULTIPLY  AND  ADD 


Floating- 
Point  Unit 

Output  Delay 

Size(Mils) 

Fabline 

min(ns) 

max(ns) 

height 

width 

MULTIPLY 

3.4 

64.4 

101.4 

101.5 

VTC_10PE 

ADD 

21.7 

215.8 

122.3 

430.8 

VTC_10PE 

B.  RECOMMENDATIONS 

Notice  that  the  maximum  output  delay  of  the  adder  illustrates  the 
operating  frequency  is  not  high  enough.  To  speed  up  the  adder  and  to  balance 
clock  rates  is  a  task  to  be  investigated.  The  goal  of  the  designer  is  to  make  the 
floating-point  adder  as  fast  as  the  multiplier.  Two  possible  improvements  are 

•  Rearrange  or  redesign  the  hardware  of  floating-point  ADD  imit,  and 

•  Pipeline  the  floating-point  ADD  unit. 

Note  that  redesign  of  the  hardware  of  floating-point  ADD  unit  could  be  a 
difficult  task.  This  requires  a  through  study  of  the  delay  bottleneck. 
However,  based  on  our  experience,  the  marginal  improvement  is  slight  and 
the  natural  approach  is  to  pipeline  the  adder  at  the  expense  of  silicon  area. 
When  pipelined,  the  ADD  unit  would  have  approximately  four  times  the 
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number  of  stages  as  the  MULTIPLY  unit.  It  is  recommended  that  pipelined 
designs  be  developed  following  the  basic  work  in  this  thesis. 

In  this  thesis,  if  the  value  is  e  =  2*8  and  f^O  (see  Figure  2.1  on  page  7)  then 
this  value  of  the  result  will  be  forced  to  zero.  If  the  value  is  ^  <  2‘8  then  we 
only  set  the  underflow  flag.  The  designer  can  redesign  the  circuit  so  that 
when  the  exponent  underflow  occurs,  the  output  value  will  be  forced  to  true 
zero.  Now,  a  true  zero  will  be  forced  as  output  when  the  actual  exponent 
would  have  been  less  than  or  equal  to  2*8.  The  underflow  flag  should  be  set 
in  both  cases. 

Special  purpose,  reduced  word  size,  floating-point  multipliers  and  adders, 
are  useful  units  for  high  speed  signal  processing,  particularly  in  FFT  units. 
Therefore  the  work  should  be  continued. 
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